
Spatio-Temporal Credit Assignment in Neuronal Population Learning


Abstract

In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate and fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
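The three-stage correlation described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the abstract gives no equations, so the exponential trace form, the time constants, the learning rate, and the names `SynapticCascade` and `run_trial` are all assumptions made for illustration. The sketch only shows how a chain of decaying traces lets a reward that arrives after a delay still change a synapse that was active at decision time.

```python
import math
from dataclasses import dataclass


@dataclass
class SynapticCascade:
    """Toy three-stage trace cascade for one synapse (illustrative only)."""
    tau_post: float = 20.0       # decay of the pre/post trace, in time steps (assumed)
    tau_decision: float = 500.0  # decay of the decision-gated trace (assumed)
    learning_rate: float = 0.01  # assumed value
    trace_post: float = 0.0
    trace_decision: float = 0.0
    weight: float = 0.0

    def step(self, pre, post, decision, reward, dt=1.0):
        # Stage 1: correlate presynaptic input with the postsynaptic event.
        self.trace_post = (self.trace_post * math.exp(-dt / self.tau_post)
                           + pre * post)
        # Stage 2: correlate the stage-1 trace with the behavioral decision.
        self.trace_decision = (self.trace_decision * math.exp(-dt / self.tau_decision)
                               + self.trace_post * decision)
        # Stage 3: correlate the stage-2 trace with external reinforcement.
        self.weight += self.learning_rate * reward * self.trace_decision
        return self.weight


def run_trial(delay, post=1.0):
    """One pre/post/decision coincidence, then a reward after `delay` silent steps."""
    synapse = SynapticCascade()
    synapse.step(pre=1.0, post=post, decision=1.0, reward=0.0)
    for _ in range(delay):
        synapse.step(pre=0.0, post=0.0, decision=0.0, reward=0.0)
    return synapse.step(pre=0.0, post=0.0, decision=0.0, reward=1.0)
```

In this sketch, `run_trial(10)` yields a larger weight change than `run_trial(200)`, and a trial with no postsynaptic event (`post=0.0`) leaves the weight unchanged, since the first stage of the cascade never becomes active.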
